Characterization of Randomized Shuffle and Sort Quantifiability in the MapReduce Model
Authors
Abstract
Quantifiability is a concept in MapReduce analytics based on the following two conditions: (a) a mapper should be cautious, that is, it should not exclude any reducer's shuffle and sort strategy from consideration; and (b) a mapper should respect the reducers' shuffle and sort preferences, that is, it should deem a reducer's shuffle and sort strategy ki infinitely more likely than k'i whenever it believes the reducer to prefer ki to k'i. A shuffle and sort strategy is quantifiable if it can optimally be chosen under common shuffle and sort conjecture in conditions (a) and (b). In this paper we present an algorithm that, for every finite MapReduce operation, computes the set of all quantifiable shuffle and sort strategies. The algorithm is based on the new idea of a key-value preference limitation, which is a pair (ki, Vi) consisting of a shuffle and sort strategy ki and a subset of shuffle and sort strategies Vi, for mapper i. The interpretation is that mapper i prefers some shuffle and sort strategy in Vi to ki. The algorithm proceeds by successively adding key-value preference limitations to the MapReduce operation.
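The iterative procedure described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the `prefers` predicate and the toy payoff table are assumptions introduced here only to make the elimination loop concrete.

```python
# Sketch of the abstract's algorithm: repeatedly apply key-value
# preference limitations (k_i, V_i) -- "mapper i prefers some strategy
# in V_i to k_i" -- eliminating such k_i until a fixed point is reached.
# What survives is the set of quantifiable shuffle and sort strategies.

def quantifiable_strategies(strategies, prefers):
    """strategies: dict mapping each mapper to its set of shuffle/sort
    strategies. prefers(i, k, candidates): True iff mapper i prefers
    some strategy in `candidates` to k, i.e. (k, candidates) is a
    valid key-value preference limitation."""
    current = {i: set(ks) for i, ks in strategies.items()}
    changed = True
    while changed:
        changed = False
        for i, ks in current.items():
            for k in list(ks):
                v_i = ks - {k}  # candidate set V_i for the limitation
                if v_i and prefers(i, k, v_i):
                    ks.remove(k)  # k can never be optimally chosen
                    changed = True
    return current

# Illustrative use: one mapper, two strategies, a hypothetical payoff.
payoff = {"a": 2, "b": 1}
def toy_prefers(i, k, candidates):
    return any(payoff[c] > payoff[k] for c in candidates)

print(quantifiable_strategies({"m1": {"a", "b"}}, toy_prefers))
# -> {'m1': {'a'}}
```

Strategy "b" is eliminated by the limitation ("b", {"a"}), since the mapper prefers "a" to "b"; once no limitation applies, the loop terminates.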
Similar Papers
Traffic Analysis in MapReduce
MapReduce is a programming model which can process large sets of data and produce output. MapReduce uses two functions to complete the work: the Map function and the Reduce function. The Map function is assigned fragmented data as input and emits intermediate data with keys; this keyed intermediate data is sent to the Reducer, where the Reducer will get the inpu...
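The Map, shuffle-and-sort, and Reduce phases described in this snippet can be illustrated with a small self-contained word-count sketch; this models the dataflow in plain Python and is not Hadoop's actual API.

```python
from collections import defaultdict

def map_fn(fragment):
    # The Map function receives a fragment of the input and emits
    # intermediate (key, value) pairs.
    for word in fragment.split():
        yield word, 1

def shuffle_sort(pairs):
    # The framework groups intermediate pairs by key and sorts the
    # keys before handing them to the reducers.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_fn(key, values):
    # The Reduce function combines all values emitted for one key.
    return key, sum(values)

fragments = ["map reduce map", "reduce map"]
intermediate = [p for frag in fragments for p in map_fn(frag)]
result = dict(reduce_fn(k, vs) for k, vs in shuffle_sort(intermediate))
# result == {'map': 3, 'reduce': 2}
```

Each fragment is mapped independently, mirroring how fragments are assigned to separate mapper tasks; only the shuffle step requires seeing all intermediate pairs.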
Asymmetric Key-Value Split Pattern Assumption over MapReduce Behavioral Model
Actual Quantifiability is a concept in MapReduce that is based on two assumptions: (1) every mapper is cautious, i.e., does not exclude any reducer's key-value split pattern choice from consideration, and (2) every mapper respects the reducers' key-value split pattern preferences, i.e., deems one reducer's key-value split pattern choice to be infinitely more likely than anoth...
Optimization and analysis of large scale data sorting algorithm based on Hadoop
When dealing with massive data sorting, we usually use Hadoop, a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. A common approach to implementing big data sorting is to use the shuffle and sort phase in MapReduce based on Hadoop. However, if we use it directly, the efficiency could be very low and the loa...
MapReduce with communication overlap (MaRCO)
MapReduce is a programming model from Google for cluster-based computing in domains such as search engines, machine learning, and data mining. MapReduce provides automatic data management and fault tolerance to improve programmability of clusters. MapReduce’s execution model includes an all-map-to-all-reduce communication, called the shuffle, across the network bisection. Some MapReductions mov...
Clash of the Titans: MapReduce vs. Spark for Large Scale Data Analytics
MapReduce and Spark are two very popular open source cluster computing frameworks for large scale data analytics. These frameworks hide the complexity of task parallelism and fault tolerance by exposing a simple programming API to users. In this paper, we evaluate the major architectural components in the MapReduce and Spark frameworks, including shuffle, execution model, and caching, by using a s...